Abstract

This paper describes the development and use of a lexical semantic database for the 
Verbmobil speech-to-speech machine translation system. The motivation is to provide 
a common information source for the distributed development of the semantics, transfer 
and semantic evaluation modules and to store lexical semantic information application- 
independently. 

The database is organized around a set of abstract semantic classes and has been 
used to define the semantic contributions of the lemmata in the vocabulary of the sys- 
tem, to automatically create semantic lexica and to check the correctness of the semantic 
representations built up. The semantic classes are modelled using an inheritance hierarchy.
The database is implemented using the lexicon formalism LeX4 developed during
the project.
1 Introduction



The distributed development of the modules of a large natural language processing system 
at different sites makes interface definitions a vital issue. It becomes even more urgent when 
several modules with the same intended functionality are developed in parallel and should be 
indistinguishable with respect to their input-output-behaviour. 

Another important issue is the acquisition and maintenance of lexical information, which
should be stored independently of an application to make it (re)usable for different purposes.

This paper describes the design and use of the Verbmobil Semantic Database which we 
developed in order to deal with these issues in the area of lexical semantics in Verbmobil. 



2 The Verbmobil Project 



The Verbmobil project¹ (Wahlster 1993; Bos et al. 1996) aims at the development of a
speech-to-speech machine translation system for face-to-face appointment scheduling dialogues.

The application scenario of Verbmobil is that a speaker of German and a speaker of 
Japanese try to schedule an appointment. They communicate mostly in English, which they 
understand better than they speak it. If they want to say something they cannot express
in English, they can have the Verbmobil system translate from both their native languages 
to English. 

The system is being developed by about 30 partners from academia and industry in Germany,
the United States and Japan. A first version, the Demonstrator, was completed in
early 1994; the release of the Research Prototype, which marks the end of the first project
phase, is scheduled for autumn 1996. A second phase is expected to start in 1997.

Verbmobil employs a semantic transfer approach to translation (Dorna and Emele 1996),
i. e. an input utterance is syntactically analyzed, a semantic representation of the content is
built up,² and this source language semantic representation is mapped to a target language
semantic representation by the transfer module. This representation is the input for the target
language generation. Additionally, a dialogue processing module and a semantic evaluation
module keep track of the discourse and answer disambiguation queries. (The relevant part of
the system architecture is shown in figure 1.)



1 Information about Verbmobil, such as available reports, can be retrieved via the World Wide Web:
http://www.dfki.uni-sb.de/verbmobil/.

2 Syntactic and semantic analysis proceed in parallel in the Research Prototype, while they were two
consecutive processing steps in the Demonstrator.



[Diagram labels: SynSem, VIT, Generation, Semantic Evaluation]

Figure 1: The relevant part of the Verbmobil architecture (simplified)

3 Motivation and Goals for the Semantic Database 

The architecture of Verbmobil makes it necessary for the semantics, transfer, semantic
evaluation and generation modules to agree on the format and contents of the semantic
representations they exchange. For example, the developers of the transfer module need to know
how the semantics of the different lemmata in the vocabulary is represented in the structures
produced by the syntax-semantics module (SynSem for short), i. e. which predicates and structures
they have to map to the target language. Conversely, the semantics developers need to know which
readings have to be distinguished by transfer in order to arrive at correct translations.

This need for information becomes even more urgent when, as in Verbmobil, there are
several SynSem modules (two for German, one for Japanese), which have to produce compatible
output, and the different modules are developed independently and in parallel by several
partners at different sites.³

As a frame for the exchange of semantic representations a common format, the Verbmobil
Interface Term, VIT for short, has been defined (Bos, Egg, and Schiehlen 1996). The VIT is
the central data structure used at the interfaces between the language modules of Verbmobil.
A VIT is a ten-place term with slots for an utterance identifier, a list of labelled semantic
predicates, a pointer to the most prominent predicate, sortal, anaphoric and syntactic
information, temporal and aspectual properties, scope relations and prosodic features. Figure 2
shows a VIT for the sentence Wir machen einen Termin aus (We arrange an appointment).
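
As a rough illustration of the ten-place structure, the slots can be mirrored in a small record type. This is only a sketch: the class name `VIT`, the field names and the use of plain strings for predicates are assumptions made for the example, not part of the Verbmobil interface definition.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class VIT:
    """Hypothetical mirror of the ten VIT slots; field names are illustrative."""
    segment: str                                             # utterance identifier
    semantics: List[str] = field(default_factory=list)      # labelled semantic predicates
    main_label: str = ""                                     # most prominent predicate
    sorts: List[str] = field(default_factory=list)           # sortal information
    discourse: List[str] = field(default_factory=list)       # anaphoric information
    syntax: List[str] = field(default_factory=list)          # agreement features
    tense_aspect: List[str] = field(default_factory=list)    # temporal/aspectual properties
    scope: List[str] = field(default_factory=list)           # scope relations
    prosody: List[str] = field(default_factory=list)         # prosodic features
    groupings: List[str] = field(default_factory=list)       # grouped predicates


# A partially filled term for the example sentence (only two slots shown).
EXAMPLE = VIT(
    segment="wir machen einen termin aus",
    semantics=["ausmachen(l4,i1)", "arg1(l4,i1,i3)", "arg3(l4,i1,i2)"],
)
SLOT_NAMES = [f.name for f in fields(VIT)]
```

Keeping every slot as a list of predicate strings is the simplest possible choice; a real implementation would presumably use structured terms.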

A VIT is an underspecified representation for a set of discourse representation structures
(Kamp and Reyle 1993) in which the scope of operators is not fixed yet. In the example
shown in figure 2 both the scope of the declarative sentence mood operator, decl/2, and of
the quantifier/indefinite, ein_card_qua/5, are left unspecified. They introduce holes, written
as h1 and h2, as their scope, which can be plugged by structures subordinated to them by
means of less or equal constraints, written as leq/2. Different ways of plugging the holes result
in different readings. In addition to the leq/2 constraints determining all possible readings,



3 In the following, we concentrate on the Semantic Database for German. The Japanese version follows the 
same principles. 
vit( segment_description(ttestr4ul, yes,
                         'wir machen einen termin aus'),
     [termin(l6,i2),                              % Semantics
      ausmachen(l4,i1),
      decl(l5,h1),
      arg1(l4,i1,i3),
      arg3(l4,i1,i2),
      ein_card_qua(l3,i2,l1,h2,1),
      pron(l9,i3)],
     l5,                                          % Main Label
     [s_sort(i1,ment_communicat_poly),            % Sorts
      s_sort(i2,&(space_time,time_sit_poly)),
      s_sort(i3,&(human,person))],
     [prontype(i3,sp_he,std)],                    % Discourse
     [num(i3,pl),                                 % Syntax
      pers(i3,1),
      gend(i2,masc),
      num(i2,sg),
      pers(i2,3),
      cas(i2,acc),
      cas(i3,nom)],
     [ta_mood(i1,ind),                            % Tense and Aspect
      ta_tense(i1,pres)],
     [ccom_plug(h2,l2),                           % Scope
      ccom_plug(h1,l3),
      leq(l2,h2),
      leq(l2,h1),
      leq(l3,h1)],
     [pros_mood(l5,decl)],                        % Prosody
     [sem_group(l2,[l4]),                         % Groupings
      sem_group(l1,[l6])]
   )

Figure 2: A VIT for Wir machen einen Termin aus ("We arrange an appointment").
we supply a default scoping based on syntactic structure in the predicates ccom_plug/2.⁴
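
To make the plugging idea concrete, the following sketch enumerates the pluggings that satisfy the leq constraints of the example VIT. It is an illustration under simplifying assumptions, not the resolution procedure used in Verbmobil: the function names `labels_below` and `admissible_pluggings` are hypothetical, labels and holes are plain strings (labels written l2, l3, l5, holes h1, h2), and a plugging is accepted as soon as every constrained label ends up below its hole.

```python
from itertools import permutations


def labels_below(hole, plugging, holes_of):
    """Collect every label reachable below `hole` once all holes are plugged."""
    below, todo, seen_holes = set(), [hole], set()
    while todo:
        h = todo.pop()
        if h in seen_holes:          # guard against cyclic pluggings
            continue
        seen_holes.add(h)
        label = plugging[h]
        below.add(label)
        todo.extend(holes_of.get(label, ()))
    return below


def admissible_pluggings(holes, candidates, holes_of, leq):
    """Try every assignment of candidate labels to holes; keep those obeying leq."""
    found = []
    for chosen in permutations(candidates, len(holes)):
        plugging = dict(zip(holes, chosen))
        if all(label in labels_below(hole, plugging, holes_of)
               for label, hole in leq):
            found.append(plugging)
    return found


# Data read off the example: decl (l5) opens h1, ein_card_qua (l3) opens h2.
HOLES = ["h1", "h2"]
CANDIDATES = ["l2", "l3"]
HOLES_OF = {"l5": ["h1"], "l3": ["h2"]}
LEQ = [("l2", "h2"), ("l2", "h1"), ("l3", "h1")]

READINGS = admissible_pluggings(HOLES, CANDIDATES, HOLES_OF, LEQ)
```

For this small example a single plugging survives, and it coincides with the default scoping given by the ccom_plug/2 predicates (h1 plugged by l3, h2 plugged by l2). Brute-force enumeration is only sensible for a handful of holes.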

All semantic predicates in the VIT are labelled (their first argument is the label). This 
allows us to group several predicates together (using the sem_group/2 predicate) and form 
complex substructures which can occur in the scope of operators. 

Apart from the purely semantic information mentioned so far, a VIT contains sortal constraints
associated with discourse markers, discourse information about anaphoric elements,
syntactic agreement and tense information. Since Verbmobil deals with spoken input, we also
represent prosodic information in the VIT.⁵

What is needed in addition to the VIT data structure definition is a definition of the
VIT's contents: for each lemma in the vocabulary of the system, a definition of the semantic
predicates and other types of information, e. g. sortal restrictions, that it introduces in the
slots of the VIT. For the verb ausmachen in the example above, for instance, we need to specify
that it introduces a predicate ausmachen(L1,I1) together with argument roles arg1(L1,I1,I2)
and arg3(L1,I1,I3) in the semantics slot and sort(I1,ment_communicat_poly) in the sorts
slot.

If a source providing this kind of information to the developers of the separate modules 
is available, the modules which deliver (the two SynSem modules) or process (especially the 
transfer module) VITs conforming to this definition can be developed in parallel. It would 
also be desirable to use this information source directly in the construction of the linguistic 
knowledge bases to guarantee consistency between the output and the specifications. 

To meet these goals, we have developed the Verbmobil Semantic Database, which we will 
describe in the remainder of this paper. 



4 Design and Implementation of the Database 

The semantic database is organized around a set of abstract semantic classes (Bos, Egg, and
Schiehlen 1996), which are used to classify the lemmata in the vocabulary. It is implemented
using the lexicon formalism LeX4.



4.1 Semantic Classes 

The semantic classes in use were originally based on a morpho-syntactic classification of the
words in the vocabulary of the system, which has been refined to account for their semantic
properties. We chose this starting point because words of a given word class usually have
the same semantic properties. The example given below shows that transitive verbs
all need an instance and two arguments with their semantic/thematic roles.
4 For more details on this underspecified approach to semantics, the reader might consult (Bos 1995; Bos
et al. 1996).

5 The VIT in figure 2 has been generated from typed input and thus contains no real prosodic information.
Class             PredScheme                                   Example

transitive_verb   R(L,I), argX(L,I,I1), argY(L,I,I2)           treffen
common_noun       R(L,I)                                       Termin
det_quant         R(L,I,H)                                     jeder
demonstrative     demonstrative(L,I,L1)                        dieser
wh_question       whq(L,I,H), tloc(L2,I2,I1), time(L1,I1)      wann

Table 1: A few examples of semantic classes



For each semantic class a representation scheme, called the predscheme, has been defined, 
which specifies the predicates together with their arity and arguments appearing in a VIT for 
instances of the class. 

As an example consider the class transitive_verb. A transitive verb is represented as
R(L,I), argX(L,I,I1), argY(L,I,I2).⁶ I. e., it introduces some relation R and two thematic
roles (I is the event variable, L a label used to refer to the verb's semantic contribution,
and I1 and I2 are the instances filling the roles). The verb's relation and the thematic roles
it assigns have to be defined for each verb in the database. Cf. table 1 for further examples
of semantic classes together with their predschemes.
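
The way a predscheme is filled in for an individual verb can be sketched as a simple substitution. The code below is an assumption-laden illustration: `TRANSITIVE_SCHEME` and `instantiate_transitive` are hypothetical names, and predicates are treated as plain strings rather than terms.

```python
# Predscheme of the class transitive_verb, with placeholders R, argX and argY.
TRANSITIVE_SCHEME = ["R(L,I)", "argX(L,I,I1)", "argY(L,I,I2)"]
VALID_ROLES = {"arg1", "arg2", "arg3"}  # the thematic roles used in Verbmobil


def instantiate_transitive(predname, role_a1, role_a2):
    """Replace the placeholders by the verb's predicate name and its two roles."""
    for role in (role_a1, role_a2):
        if role not in VALID_ROLES:
            raise ValueError("unknown thematic role: " + role)
    substitution = {"R": predname, "argX": role_a1, "argY": role_a2}
    result = []
    for item in TRANSITIVE_SCHEME:
        functor, arguments = item.split("(", 1)
        result.append(substitution[functor] + "(" + arguments)
    return result


AUSMACHEN = instantiate_transitive("ausmachen", "arg1", "arg3")
```

For ausmachen with the roles arg1 and arg3 this yields the three predicates ausmachen(L,I), arg1(L,I,I1) and arg3(L,I,I2).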



4.2 The Lexicon Formalism LeX4

The semantic database makes use of the lexicon formalism LeX4 developed in the course of
the Verbmobil project (Gebhardi and Heinecke 1995a; Gebhardi 1996).

The lexicon formalism LeX4 has been used since summer 1994 within Verbmobil's lexicon
group. It is based on feature structures (permitting disjunction and negation) embedded in
an inheritance hierarchy of classes.

In LeX4 the task of constructing a lexicon is split up into four parts:

1. Modelling the lexicon (i.e. its linguistic classes), 

2. data-acquisition (can be done at the same time by different contributors), 

3. definition of the application-interface (data can be compiled into every format needed
after being processed by the LeX4-machine), and

4. efficient storage. 

Modelling a lexicon involves defining classes, their appropriate features, and inheritance
relations between classes. Examples for defining classes will be given below in section 4.3;



6 X and Y stand for the values {1,2,3}, since arg1, arg2, arg3 are the thematic roles used in Verbmobil.
appropriateness of features is dealt with in the remainder of this section. For data acquisition,
a graphical acquisition tool has been implemented (Heinecke 1996). How the application
interface is used in the context of the semantic database will be shown in section 5. Part
of the application interface is the LeX4-Trafo which outputs the stored information in any
format required. A database system for efficient storage has been developed (Kruschwitz and
Gebhardi 1996).



Among other formalism constructs, the possible values of a feature can be specified in two 
ways. If there is no restriction on the value of a feature, it is assigned the most general value 
keyword (top): 

predname : top . 

Otherwise, the formalism allows one to define the appropriateness conditions of a feature, using
disjunctions to specify the appropriate values as in the following example (the values listed
in the disjunction are the appropriate ones which can be assigned to the feature sort_of_inst):

sort_of_inst: ( abstract \ anything \ communicat_result_poly \
                communicat_sit \ person ) .
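
The two ways of restricting a feature can be mimicked with a small lookup. This is a sketch only: the names `TOP`, `APPROPRIATE` and `is_appropriate` are assumptions for the example, and the sets simply repeat the values shown in the two declarations.

```python
TOP = object()  # stands for the most general value "top" (no restriction)

# Appropriateness conditions, copied from the two feature declarations.
APPROPRIATE = {
    "predname": TOP,
    "sort_of_inst": {
        "abstract", "anything", "communicat_result_poly",
        "communicat_sit", "person",
    },
}


def is_appropriate(feature, value):
    """True if `value` may be assigned to `feature`."""
    allowed = APPROPRIATE[feature]
    return allowed is TOP or value in allowed


CHECKS = [
    is_appropriate("predname", "termin"),         # top accepts anything
    is_appropriate("sort_of_inst", "person"),     # listed in the disjunction
    is_appropriate("sort_of_inst", "vehicle"),    # not listed
]
```

A feature declared with top accepts any value, while a feature declared with a disjunction accepts only the listed alternatives.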

For constructing morphological lexica, inflection or lexical rules can easily be implemented
to generate multiple instances of a single entry (Gebhardi and Heinecke 1995b; Heinecke and
Gebhardi 1995).



Database entries, called bases, are instances of a class. Consequently, they assign values 
to the features they inherit from their class which are not yet fully specified by the class 
definition. For a verb's base, e. g., one has to specify its predicate name, thematic roles, the 
sort of its instance, etc. 



4.3 Semantic Classes and their Representation in LeX4

The abstract semantic classes of section 4.1 have been modelled in the lexicon formalism LeX4
along the following lines.

Firstly, a general superclass semdb_c is defined from which all classes inherit features for 
the lemma, the main predicate's name, the part of speech etc. The individual subclasses 
corresponding to the abstract semantic classes additionally introduce a specific predscheme 
for each predicate associated with words of this class and features for sortal information, 
thematic roles etc. 



class semdb_c :< top >:               % - Main class from which
                                      %   all classes inherit.
  syntax_link: top &                  % - Link to syntactic lexicon.
  predname: top &                     % - Name of the semantic predicate.
  lemma: top &                        % - Lemma of the entry.
  pos: top .                          % - Part of Speech of the occurrences
                                      %   in the corpora.

Figure 3: Part of the class hierarchy

While the abstract semantic classes are not hierarchically organized, their modelling in LeX4
makes use of a hierarchy to capture generalizations. For instance, we integrate all properties
the verb classes have in common and place them in an abstract verb class verb_c from which
all verb classes, e. g. transitive_c, inherit, cf. figure 3 (classes corresponding to semantic
classes are shown in boldface) and below.

class verb_c :< semdb_c >:            % - All verbal classes inherit this.
  sort_of_inst: top .                 % - Sort of eventuality.

class transitive_c :< verb_c >:       % - Transitive verbs
  semclass: transitive_verb &         % - Semantic class.
  predscheme: 'L,I' &                 % - PredScheme for the PredName
                                      %   of all transitive verbs.
  predscheme_a1: 'L,I,I1' &           % - PredScheme for the first
  predscheme_a2: 'L,I,I2' &           %   and the second argument.
  role_a1: (arg1 \ arg2 \ arg3) &     % - Thematic roles of the arguments
  role_a2: (arg1 \ arg2 \ arg3) .     %   of the verb (restricted
                                      %   to three valid values).

As a second example, consider the following definition for the LeX4 equivalent of the
abstract semantic class common_noun:



class common_noun_c :< semdb_c >:     % - Standard nouns
  predscheme: 'L,I' &                 % - PredScheme for standard nouns.
  sort_of_inst: top &                 % - Sort of instance.
  semclass: common_noun .             % - Semantic class.
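
The effect of the inheritance hierarchy can be sketched outside the formalism as follows. The code is a hypothetical approximation, not the behaviour of the actual lexicon machinery: `CLASSES`, `inherited_features` and `expand_base` are names invented for the example, fixed feature values are strings, restricted features are sets, and unrestricted features use a `TOP` marker.

```python
TOP = object()                                  # unrestricted value ("top")
ROLES = frozenset({"arg1", "arg2", "arg3"})     # valid thematic roles

# class name -> (parent class, features introduced by the class itself)
CLASSES = {
    "semdb_c": (None, {"syntax_link": TOP, "predname": TOP,
                       "lemma": TOP, "pos": TOP}),
    "verb_c": ("semdb_c", {"sort_of_inst": TOP}),
    "transitive_c": ("verb_c", {"semclass": "transitive_verb",
                                "predscheme": "L,I",
                                "predscheme_a1": "L,I,I1",
                                "predscheme_a2": "L,I,I2",
                                "role_a1": ROLES, "role_a2": ROLES}),
    "common_noun_c": ("semdb_c", {"predscheme": "L,I",
                                  "sort_of_inst": TOP,
                                  "semclass": "common_noun"}),
}


def inherited_features(class_name):
    """Merge the features of a class with those of all its ancestors."""
    parent, own = CLASSES[class_name]
    features = inherited_features(parent) if parent else {}
    features.update(own)
    return features


def expand_base(class_name, values):
    """Combine class-level values with the entry's own values, checking roles."""
    expanded = {}
    for feature, spec in inherited_features(class_name).items():
        if isinstance(spec, str):               # value fixed by the class
            expanded[feature] = spec
            continue
        value = values[feature]                 # must be supplied by the entry
        if spec is not TOP and value not in spec:
            raise ValueError(feature + ": inappropriate value " + value)
        expanded[feature] = value
    return expanded


AUSMACHEN_VALUES = {
    "pos": "VVFIN;VVINF", "lemma": "ausmachen", "syntax_link": "ausmachen",
    "predname": "ausmachen", "sort_of_inst": "communicat_sit;mental_sit",
    "role_a1": "arg1", "role_a2": "arg3",
}
AUSMACHEN = expand_base("transitive_c", AUSMACHEN_VALUES)
```

An entry of class transitive_c thus receives the four general features of semdb_c, the sort feature of verb_c and the six features of its own class, of which only the unrestricted or partially restricted ones have to be supplied by the entry.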
4.4 Representation of Lemmata

A base for a lemma consists of its classification together with its idiosyncratic properties in
terms of feature values; it inherits the feature values which are specified in the definition of
the class. The idiosyncratic information includes predicate names, sortal restrictions
etc. Thus an entry inherits the predscheme from the class, while the concrete predicate name
in the predscheme is defined in the entry itself:

base 'Termin' :<< common_noun_c >>:               % - The entry 'Termin'
                                                  %   inherits its structure from
                                                  %   the class 'common_noun_c'.
  pos: 'NN' &                                     % - Further individual
  lemma: 'Termin' &                               %   specification for
  syntax_link: 'termin' &                         %   the current entry.
  predname: 'termin' &
  sort_of_inst: 'time_sit_poly' .

base 'ausmachen' :<< transitive_c >>:             % - The entry 'ausmachen'
                                                  %   inherits its structure from
                                                  %   the class 'transitive_c'.
  pos: 'VVFIN;VVINF' &                            % - Further specifications.
  lemma: 'ausmachen' &
  syntax_link: 'ausmachen' &
  predname: 'ausmachen' &
  sort_of_inst: (communicat_sit \ mental_sit) &
  role_a1: 'arg1' &
  role_a2: 'arg3' .

When processing the class definitions and the bases, the LeX4-machine will calculate all
instances from the specifications and expand the base accordingly.

5 Application of the Semantic Database

The Semantic Database is currently being used for creating the semantic lexica of the
syntactic-semantic modules of Verbmobil, for producing a table of lemmata with the predicates and
other types of information they introduce in a VIT and for checking the correctness of the
generated interface terms automatically; it can also be accessed via the World Wide Web.

A similar procedure is used to generate the semantic lexicon etc. for the Japanese
syntactic-semantic module of Verbmobil (Mori 199TS).
5.1 Creation of the Semantic Lexicon

Consider the compilation of the semantic lexicon from the database for the German SynSem
module SynSemS3.⁷ To guarantee consistency between the output of the SynSem module
and the specifications in the database, the semantic lexicon is generated from the semantic
database.

After the LeX4-machine has processed the entries and expanded them according to the
class definitions, the LeX4-Trafo compiles the output into the format required for the
semantic lexicon.



sort1_trafo(Base, Class,                  % - Default rule for entries
  [ predname: Predn,                      %   with one sort.
    syntax_link: Sl,
    sort_of_inst: Si,
    usb_macro: M
  ]) =>
  fmt("sem_lex(Cat, ~w) short_for~n ~w(Cat, ~w, (~w)).~n",
      [Sl, M, Predn, Si], []).

trans_trafo(Base, Class,                  % - Rule for bivalent verbs.
  [ predname: Pn,
    syntax_link: Sl,
    sort_of_inst: Si,
    role_a1: R1,
    role_a2: R2,
    usb_macro: M
  ]) =>
  fmt("sem_lex(Cat, ~w) short_for~n ~w(Cat, ~w, (~w), [~w,~w]).~n",
      [Sl, M, Pn, Si, R1, R2], []).

The two examples above appear in the semantic lexicon as:

sem_lex(Cat, termin) short_for
    common_noun_sem(Cat, termin, (time_sit_poly)).
sem_lex(Cat, ausmachen) short_for
    trans_verb_sem(Cat, ausmachen, (communicat_sit;mental_sit),
                   [arg1,arg3]).
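
The compilation step can be approximated in a few lines. This sketch does not reproduce the transformation tool itself; it only shows, under stated assumptions, how an expanded entry could be rendered in the lexicon format. The function name `sem_lex_entry` is hypothetical, the entry is a plain dictionary, and the macro names are taken from the output shown above and stored under the feature usb_macro.

```python
def sem_lex_entry(features):
    """Render one expanded database entry as a sem_lex/2 macro definition."""
    arguments = [features["predname"],
                 "({})".format(features["sort_of_inst"])]
    # Bivalent verbs additionally list their two thematic roles.
    if "role_a1" in features and "role_a2" in features:
        arguments.append("[{},{}]".format(features["role_a1"],
                                          features["role_a2"]))
    return "sem_lex(Cat, {}) short_for\n    {}(Cat, {}).\n".format(
        features["syntax_link"], features["usb_macro"], ", ".join(arguments))


TERMIN = sem_lex_entry({
    "syntax_link": "termin", "predname": "termin",
    "sort_of_inst": "time_sit_poly", "usb_macro": "common_noun_sem",
})
AUSMACHEN = sem_lex_entry({
    "syntax_link": "ausmachen", "predname": "ausmachen",
    "sort_of_inst": "communicat_sit;mental_sit",
    "role_a1": "arg1", "role_a2": "arg3", "usb_macro": "trans_verb_sem",
})
```

Printing `TERMIN` and `AUSMACHEN` gives the two lexicon entries for Termin and ausmachen, apart from line breaks inside the longer entry.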



7 SynSemS3 is the syntactic-semantic module developed by Siemens AG (syntax), University of the
Saarland and University of Stuttgart (semantics). The other SynSem module developed by IBM Germany
makes use of the table output (cf. section 5.2) of the database to create a semantic lexicon.
The syntactic lexicon contains calls to the macro sem_lex/2 which is expanded in the
semantic lexicon as shown above. The mapping from syntactic to semantic lexical entries
is achieved via the second argument of sem_lex/2, which originates from the feature
syntax_link in the semantic database.⁸

5.2 Table-based Representation

Apart from compiling out semantic lexica, we generate a table of lemmata together with their
semantic representations and additional information out of the database by using a different
set of transformation rules for LeX4-Trafo. This table is used by the transfer developers as
a basis for writing transfer rules and as an information source for the automatic correctness
check on VIT representations.

transitive_trafo(Base, Class,             % - Rule for bivalent verbs.
  [ lemma: Lm,
    pos: Pos,
    semclass: Semc,
    predname: Pn,
    predscheme: Ps,
    predscheme_a1: Ps1,
    predscheme_a2: Ps2,
    role_a1: Ra1,
    role_a2: Ra2,
    sort_of_inst: Si,
    inst_link: Il,
    sort_a1: Sa1, a1_link: Al1,
    sort_a2: Sa2, a2_link: Al2
  ]) =>
  fmt("~w ~w ~w ~w,~w,~w ~w ~w(~w), ~w(~w), ~w(~w) ~w/~w, ~w/~w, ~w/~w - -~n",
      [Base, Lm, Pos, Pn, Ra1, Ra2, Semc, Pn, Ps, Ra1, Ps1, Ra2, Ps2,
       Il, Si, Al1, Sa1, Al2, Sa2], []).

default_ps1_inst1(Base, Class,            % - Default rule for entries with
  [ lemma: Lm,                            %   one PredScheme and one Sort
    pos: Pos,                             %   (used e.g. by 'common_noun').
    semclass: Semc,
    predname: Pn,
    predscheme: Ps,
    sort_of_inst: Si
  ]) =>
  fmt("~w ~w ~w ~w ~w ~w(~w) ~w - -~n",
      [Base, Lm, Pos, Pn, Semc, Pn, Ps, Si], []).

8 The first argument of sem_lex/2 ranges over entry nodes of the feature structures of the lexical entry.

In the table output the two examples above appear as:

Termin Termin NN termin common_noun termin(L,I) I/time_sit_poly - -
ausmachen ausmachen VVFIN;VVINF ausmachen,arg1,arg3 transitive_verb ...
    ausmachen(L,I), arg1(L,I,I1), arg3(L,I,I2) Il/communicat_sit;mental_sit - -

In general, LeX4-Trafo maps the output of the LeX4-machine onto
the first matching rule in the rule system. Thus only a few class-specific rules are necessary;
default rules cover the entries of the majority of the classes to be transformed.
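
The first-match strategy can be sketched as an ordered rule list. The details are assumptions made for illustration: a rule is taken to match when the entry provides every feature the rule mentions, and `RULES` and `first_matching_rule` are hypothetical names; only the two rule names and their feature lists come from the rules shown in this section.

```python
# Ordered rule system: class-specific rules first, default rules last.
RULES = [
    ("transitive_trafo", ("lemma", "pos", "semclass", "predname", "predscheme",
                          "predscheme_a1", "predscheme_a2",
                          "role_a1", "role_a2", "sort_of_inst")),
    ("default_ps1_inst1", ("lemma", "pos", "semclass", "predname",
                           "predscheme", "sort_of_inst")),
]


def first_matching_rule(entry):
    """Return the name of the first rule whose features are all present."""
    for name, required in RULES:
        if all(feature in entry for feature in required):
            return name
    return None


NOUN = {"lemma": "Termin", "pos": "NN", "semclass": "common_noun",
        "predname": "termin", "predscheme": "L,I",
        "sort_of_inst": "time_sit_poly"}
VERB = dict(NOUN, lemma="ausmachen", pos="VVFIN;VVINF",
            semclass="transitive_verb", predname="ausmachen",
            predscheme_a1="L,I,I1", predscheme_a2="L,I,I2",
            role_a1="arg1", role_a2="arg3",
            sort_of_inst="communicat_sit;mental_sit")

CHOSEN = (first_matching_rule(NOUN), first_matching_rule(VERB))
```

Because the verb entry also provides every feature required by the default rule, the order of the list matters: the more specific rule has to precede the default.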

6 Summary 

We have successfully used the semantic database to deal with about 2000 German and 150
Japanese lemmata for version 1.0 of the Research Prototype in the way described, especially
to generate semantic lexica for the German syntax-semantics module SynSemS3, and the
Japanese one developed by DFKI Saarbrücken and the University of the Saarland.

The use of the semantic database by both the semantics module and the transfer module 
guarantees consistency between the representations produced by the semantics module and 
the expectations of the transfer module, while both can be developed in parallel. 